# We will use the following packages.
# If needed, install them: pak::pkg_install().
stopifnot(
  require("corrr"),
  require("magrittr"),
  require("lobstr"),
  require("ggforce"),
  require("gt"),
  require("glue"),
  require("skimr"),
  require("patchwork"),
  require("tidyverse"),
  require("ggfortify")
  # require("autoplotly")
)
Compute, display and comment on the sample correlation matrix
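As a possible starting point, the sketch below computes the sample correlation matrix with base R only. It assumes the built-in `swiss` data frame from the datasets package; the name `R_swiss` is ours, chosen for illustration.

```r
# Sample (Pearson) correlation matrix of the six swiss variables
R_swiss <- cor(swiss)

# Rounded display, easier to read and to comment on
print(round(R_swiss, 2))

# Correlations of Fertility with the five covariates, sorted
sort(R_swiss["Fertility", colnames(R_swiss) != "Fertility"])
```

The corrr package loaded above offers a tidy alternative through `correlate()`, which returns a tibble rather than a matrix.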
Display jointplots for each pair of variables
Singular Value Decomposition (SVD)
Question
Project the swiss dataset on the covariates (all columns but Fertility)
Center the projected data using matrix manipulation
Center the projected data using dplyr verbs
Compare the results with the output of scale() with various optional arguments
Call the centered matrix Y
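The three centering routes listed above can be sketched as follows. This is one way among several; the names `X`, `H`, `Y_mat`, `Y_dplyr` and `Y_std` are ours, and the dplyr version assumes R 4.1 or later (native pipe and `\(x)` lambdas).

```r
# Covariates: every column of swiss except Fertility
X <- as.matrix(swiss[, setdiff(names(swiss), "Fertility")])
n <- nrow(X)

# (a) Matrix manipulation: multiply by the centering matrix I - (1/n) 1 1^T
H <- diag(n) - matrix(1 / n, n, n)
Y_mat <- H %*% X

# (b) dplyr verbs: subtract its mean from every column
Y_dplyr <- swiss |>
  dplyr::select(-Fertility) |>
  dplyr::mutate(dplyr::across(dplyr::everything(), \(x) x - mean(x)))

# (c) scale(): centre only, or centre and divide by column standard deviations
Y <- scale(X, center = TRUE, scale = FALSE)
Y_std <- scale(X, center = TRUE, scale = TRUE)

# The centred versions agree up to attributes (dimnames, "scaled:center")
all.equal(Y_mat, Y, check.attributes = FALSE)
all.equal(as.matrix(Y_dplyr), Y, check.attributes = FALSE)
```

Note that `scale()` stores the column means (and standard deviations, when `scale = TRUE`) as attributes of its result, which is why the comparisons ignore attributes.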
Question
Check that the output of svd(Y) actually defines a Singular Value Decomposition.
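A minimal sketch of such a check, assuming `Y` is the centered covariate matrix (rebuilt here so that the block runs on its own). It verifies the three defining properties: reconstruction, orthonormal columns, and sorted non-negative singular values.

```r
Y <- scale(as.matrix(swiss[, names(swiss) != "Fertility"]),
           center = TRUE, scale = FALSE)
s <- svd(Y)  # list with components d, u and v

# 1. Reconstruction: Y = U diag(d) V^T
recon_ok <- isTRUE(all.equal(s$u %*% diag(s$d) %*% t(s$v), Y,
                             check.attributes = FALSE))

# 2. U and V have orthonormal columns
u_ok <- isTRUE(all.equal(crossprod(s$u), diag(ncol(s$u))))
v_ok <- isTRUE(all.equal(crossprod(s$v), diag(ncol(s$v))))

# 3. Singular values are non-negative and sorted in non-increasing order
d_ok <- all(s$d >= 0) && !is.unsorted(rev(s$d))

c(reconstruction = recon_ok, U = u_ok, V = v_ok, d = d_ok)
```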
Question
Relate the SVD of \(Y\) and the eigen decomposition of \(Y^\top \times Y\)
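If \(Y = U D V^\top\) with \(U^\top U = I\), then \(Y^\top Y = V D^2 V^\top\): the right singular vectors of \(Y\) are eigenvectors of \(Y^\top Y\), and the eigenvalues are the squared singular values. The sketch below checks this numerically; eigenvectors are only defined up to sign, so the comparison uses absolute values.

```r
Y <- scale(as.matrix(swiss[, names(swiss) != "Fertility"]),
           center = TRUE, scale = FALSE)
s <- svd(Y)
e <- eigen(crossprod(Y), symmetric = TRUE)  # crossprod(Y) is t(Y) %*% Y

# Eigenvalues of Y^T Y are the squared singular values of Y
all.equal(e$values, s$d^2)

# Eigenvectors coincide with the columns of V, up to sign
all.equal(abs(e$vectors), abs(s$v))
```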
Perform PCA on covariates
Question
Pairwise analysis did not provide us with a clear and simple picture of the French-speaking districts.
PCA (Principal Component Analysis) aims at exploring the variations of multivariate datasets around their mean (center of inertia). In the sequel, we will perform PCA on the matrix of centered covariates, with and without standardizing the centered columns.
Base R offers prcomp(). Call prcomp() on the centered covariates.
Note that R also offers princomp(), which relies on an eigendecomposition of the covariance matrix rather than on the SVD.
Question
Check that prcomp() is indeed a wrapper for svd().
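One way to carry out this check is to compare the three components of the prcomp object with the pieces of `svd(Y)`. The sketch assumes unstandardized PCA on the centered covariates; signs of singular vectors are arbitrary, hence the absolute values.

```r
X <- as.matrix(swiss[, names(swiss) != "Fertility"])
Y <- scale(X, center = TRUE, scale = FALSE)
n <- nrow(Y)

pca <- prcomp(Y)  # centers again by default, which is harmless here
s <- svd(Y)

# Standard deviations are singular values divided by sqrt(n - 1)
all.equal(pca$sdev, s$d / sqrt(n - 1))

# rotation is V, up to the sign of each column
all.equal(abs(unname(pca$rotation)), abs(s$v))

# x (the scores) is U diag(d) = Y V, up to the sign of each column
all.equal(abs(unname(pca$x)), abs(s$u %*% diag(s$d)))
```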
Question
Check that the rows and columns of the rotation component of the prcomp() result have unit norm.
Question
Check the orthogonality of \(V\) (the rotation component of the prcomp object)
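Unit norms and orthogonality can both be checked in a few lines. The sketch below assumes unstandardized PCA on the five covariates, so that \(V\) is a square \(5 \times 5\) matrix.

```r
pca <- prcomp(swiss[, names(swiss) != "Fertility"])
V <- pca$rotation

# Euclidean norms of the columns and of the rows
col_norms <- sqrt(colSums(V^2))
row_norms <- sqrt(rowSums(V^2))
round(col_norms, 10)
round(row_norms, 10)

# V^T V = I and V V^T = I: V is an orthogonal matrix
all.equal(crossprod(V), diag(ncol(V)), check.attributes = FALSE)
all.equal(tcrossprod(V), diag(nrow(V)), check.attributes = FALSE)
```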
Question
Make a scatterplot from the first two columns of the \(x\) component of the prcomp object.
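A possible ggplot2 version of this scatterplot is sketched below. The data frame name `scores` and the added `district` column are ours; `coord_fixed()` keeps the two axes on the same scale so that distances are not distorted.

```r
library(ggplot2)

pca <- prcomp(swiss[, names(swiss) != "Fertility"])

# Scores on the principal axes, one row per district
scores <- data.frame(pca$x, district = rownames(swiss))

p_scores <- ggplot(scores, aes(x = PC1, y = PC2)) +
  geom_point() +
  coord_fixed() +
  labs(title = "swiss covariates: first two principal components")

p_scores
```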
Question
Define a graphical pipeline for the screeplot.
Hint: use function tidy() from broom, to get the data in the right form from an instance of prcomp.
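Following the hint, a screeplot pipeline could look like the sketch below. With `matrix = "eigenvalues"`, `broom::tidy()` returns one row per component with columns `PC`, `std.dev`, `percent` and `cumulative`; the plot styling is our choice.

```r
library(ggplot2)

pca <- prcomp(swiss[, names(swiss) != "Fertility"])

# One row per principal component
eig <- broom::tidy(pca, matrix = "eigenvalues")

p_scree <- ggplot(eig, aes(x = factor(PC), y = percent)) +
  geom_col() +
  scale_y_continuous(labels = scales::label_percent()) +
  labs(x = "Principal component", y = "Share of variance",
       title = "Screeplot")

p_scree
```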
Question
Define a function that replicates autoplot.prcomp()
Project the dataset on the first two principal components (perform dimension reduction) and build a scatterplot. Colour the points according to the value of original covariates.
Hint: use generic function augment from broom.
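A minimal sketch of such a function, under the assumption that `broom::augment()` is called with the original data so that the scores (columns `.fittedPC1`, `.fittedPC2`, ...) sit next to the covariates. The function name `plot_pca_scores()` and its arguments are hypothetical, and it only covers a small part of what `autoplot.prcomp()` does.

```r
library(ggplot2)

# Hypothetical helper: scatterplot of the first two principal components,
# coloured by a column of `data` given as a string
plot_pca_scores <- function(pca, data, colour) {
  aug <- broom::augment(pca, data = data)
  ggplot(aug, aes(x = .fittedPC1, y = .fittedPC2,
                  colour = .data[[colour]])) +
    geom_point() +
    coord_fixed() +
    labs(x = "PC1", y = "PC2", colour = colour)
}

pca <- prcomp(swiss[, names(swiss) != "Fertility"])
p_aug <- plot_pca_scores(pca, swiss, colour = "Catholic")
p_aug
```

Changing `colour` to another covariate (for instance `"Education"`) gives a quick way to see which variables vary along each axis.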
Question
Apply broom::tidy() with optional argument matrix="v" or matrix="loadings" to the prcomp object.
Comment.
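For reference, the call can be sketched as follows. The result is in long format, with one row per (variable, component) pair and columns `column`, `PC` and `value`; the entries are those of the `rotation` matrix.

```r
pca <- prcomp(swiss[, names(swiss) != "Fertility"])

# Long-format loadings: columns `column`, `PC`, `value`
load_long <- broom::tidy(pca, matrix = "loadings")
head(load_long)

# Each value is an entry of the rotation matrix, e.g. Education on PC1
edu_pc1 <- load_long$value[load_long$column == "Education" &
                             load_long$PC == 1]
all.equal(edu_pc1, unname(pca$rotation["Education", "PC1"]))
```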
Question
Build the third SVD plot, the so-called correlation circle.
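One common construction, sketched here, plots for each variable its correlations with the first two principal components, inside the unit circle. The names `circ` and `unit_circle` are ours, and the circle is drawn by hand with `geom_path()` (ggforce's `geom_circle()` would be an alternative).

```r
library(ggplot2)

X <- swiss[, names(swiss) != "Fertility"]
pca <- prcomp(X)

# Correlations between original variables and the first two components
circ <- as.data.frame(cor(X, pca$x[, 1:2]))
circ$variable <- rownames(circ)

# Points on the unit circle
theta <- seq(0, 2 * pi, length.out = 200)
unit_circle <- data.frame(x = cos(theta), y = sin(theta))

p_circle <- ggplot(circ) +
  geom_path(aes(x = x, y = y), data = unit_circle, colour = "grey60") +
  geom_segment(aes(x = 0, y = 0, xend = PC1, yend = PC2),
               arrow = arrow(length = unit(0.02, "npc"))) +
  geom_text(aes(x = PC1, y = PC2, label = variable), vjust = -0.5) +
  coord_fixed() +
  labs(x = "PC1", y = "PC2", title = "Correlation circle")

p_circle
```

A variable whose arrow nearly reaches the circle is well represented in the first principal plane; a short arrow means most of its variation lies along other axes.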
Question
Compute PCA after standardizing the columns, draw the correlation circle.
Compare standardized and non-standardized PCA
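The comparison can start from a few numerical summaries, as in the sketch below: column variances (which drive the unstandardized PCA), shares of variance per axis, and variable/axis correlations for both versions. The object names are ours.

```r
X <- swiss[, names(swiss) != "Fertility"]

pca_raw <- prcomp(X)                 # centered only
pca_std <- prcomp(X, scale. = TRUE)  # centered and standardized

# Column variances: large-variance columns dominate the raw PCA
round(sapply(X, var), 1)

# Share of variance carried by each axis, for both versions
share <- rbind(raw = pca_raw$sdev^2 / sum(pca_raw$sdev^2),
               std = pca_std$sdev^2 / sum(pca_std$sdev^2))
round(share, 3)

# Coordinates of the variables on the two correlation circles
cor_raw <- cor(X, pca_raw$x[, 1:2])
cor_std <- cor(X, pca_std$x[, 1:2])
round(cor_raw, 2)
round(cor_std, 2)
```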
Question
Pay attention to the correlation circles.
How well are variables represented?
Which variables contribute to the first axis?
Question
Explain the contrast between the two correlation circles.
In the sequel we focus on standardized PCA.
Provide an interpretation of the first two principal axes
Question
Which variables contribute to the first two principal axes?
Question
Analyze the signs of the correlations between variables and axes.
Add the Fertility variable
Question
Plot again the correlation circle using the same principal axes as before, but add the Fertility variable.
How does Fertility relate to the covariates? To the principal axes?
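Fertility can be treated as a supplementary variable: it plays no role in building the axes, but its correlations with the existing principal components can still be computed and drawn. A sketch, assuming standardized PCA on the five covariates:

```r
X <- swiss[, names(swiss) != "Fertility"]
pca_std <- prcomp(X, scale. = TRUE)

# Correlations of all six variables (Fertility included) with the
# first two principal components of the covariates
circ_all <- as.data.frame(cor(swiss, pca_std$x[, 1:2]))
circ_all$variable <- rownames(circ_all)
circ_all$role <- ifelse(circ_all$variable == "Fertility",
                        "supplementary", "active")

circ_all
```

The resulting data frame can be fed to the same correlation-circle pipeline as before, mapping `role` to colour to set Fertility apart.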
Biplot
Question
The last SVD plot (biplot) consists of overlaying the scatterplot of component x of the prcomp object and the correlation circle.
So the biplot is a graphical object built on two data frames derived from the x and rotation components of the prcomp object.
Design a graphical pipeline.
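A possible pipeline is sketched below: one layer for the scores (from `x`), one for the variable arrows (from `rotation`). The stretch factor `k` is an arbitrary assumption of ours, used only to make the arrows visible at the scale of the scores.

```r
library(ggplot2)

X <- swiss[, names(swiss) != "Fertility"]
pca_std <- prcomp(X, scale. = TRUE)

# Two data frames: district scores and variable directions
scores <- data.frame(pca_std$x[, 1:2], district = rownames(X))
arrows_df <- data.frame(pca_std$rotation[, 1:2],
                        variable = rownames(pca_std$rotation))

k <- 3  # arbitrary stretch factor for the arrows (assumption)

p_biplot <- ggplot() +
  geom_point(aes(x = PC1, y = PC2), data = scores, alpha = 0.6) +
  geom_segment(aes(x = 0, y = 0, xend = k * PC1, yend = k * PC2),
               data = arrows_df, colour = "firebrick",
               arrow = arrow(length = unit(0.02, "npc"))) +
  geom_text(aes(x = k * PC1, y = k * PC2, label = variable),
            data = arrows_df, colour = "firebrick", vjust = -0.5) +
  coord_fixed() +
  labs(x = "PC1", y = "PC2", title = "Biplot (standardized PCA)")

p_biplot
```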
Question
autoplot.prcomp() has optional arguments. If set to TRUE, the logical argument loadings overlays the correlation circle on the scatterplot defined by the principal components.
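A short usage sketch, assuming ggfortify is installed. Besides `loadings`, the call below uses `loadings.label` to name the arrows and `colour` to map a column of `data` to point colour; the choice of `"Catholic"` is ours.

```r
library(ggfortify)  # provides autoplot() methods for prcomp objects

pca_std <- prcomp(swiss[, names(swiss) != "Fertility"], scale. = TRUE)

p_auto <- autoplot(pca_std,
                   data = swiss,            # original data, for colouring
                   colour = "Catholic",
                   loadings = TRUE,         # overlay variable arrows
                   loadings.label = TRUE)   # label each arrow

p_auto
```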
Generics
autoplot() is an example of an S3 generic function. Let us examine this function using sloop.
Use sloop::s3_dispatch() to compare autoplot(prcomp(swiss)) and autoplot(lm(Fertility ~ ., swiss))
Use sloop::s3_get_method() to see the body of autoplot.prcomp
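The two inspections can be sketched as follows, assuming sloop and ggfortify are installed. Note that sloop exports the method-retrieval helper under the name `s3_get_method()`.

```r
library(ggfortify)  # registers autoplot() methods such as autoplot.prcomp

# Which methods does autoplot() consider for each class of argument?
sloop::s3_dispatch(autoplot(prcomp(swiss)))
sloop::s3_dispatch(autoplot(lm(Fertility ~ ., swiss)))

# Retrieve the method itself, even if it is not exported
m_prcomp <- sloop::s3_get_method(autoplot.prcomp)
m_prcomp
```

In the output of `s3_dispatch()`, the method that is actually called is marked with `=>`, while methods that exist but are not called are marked with `*`.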